Abstract — We consider the problem of optimally allocating 
a given total storage budget in a distributed storage system. 
A source has a data object which it can code and store over 
a set of storage nodes; it is allowed to store any amount 
of coded data in each node, as long as the total amount of 
storage used does not exceed the given budget. A data collector 
subsequently attempts to recover the original data object by 
accessing each of the nodes independently with some constant 
probability. By using an appropriate code, successful recovery 
occurs when the total amount of data in the accessed nodes 
is at least the size of the original data object. The goal is to 
find an optimal storage allocation that maximizes the probability 
of successful recovery. This optimization problem is challenging 
because of its discrete nature and nonconvexity, despite its 
simple formulation. Symmetric allocations (in which all nonempty 
nodes store the same amount of data), though intuitive, may 
be suboptimal; the problem is nontrivial even if we optimize 
over only symmetric allocations. Our main result shows that 
the symmetric allocation that spreads the budget maximally 
over all nodes is asymptotically optimal in a regime of interest. 
Specifically, we derive an upper bound for the suboptimality 
of this allocation and show that the performance gap vanishes 
asymptotically in the specified regime. Further, we explicitly find 
the optimal symmetric allocation for a variety of cases. Our 
results can be applied to distributed storage systems and other 
problems dealing with reliability under uncertainty, including 
delay tolerant networks (DTNs) and content delivery networks 
(CDNs). 

I. Introduction 

Consider a distributed storage system comprising $n$ storage nodes. A source has a data object of unit size which is to be coded and stored in a distributed manner over these nodes; it could, for instance, split the data object into multiple chunks and then replicate them redundantly over the nodes. Let $x_i$ be the amount of data stored in node $i \in \{1,\dots,n\}$. Any amount of data may be stored in each node, as long as the total amount of storage used is at most a given budget $T$, that is, $\sum_{i=1}^{n} x_i \le T$. This is a realistic constraint if there is limited transmission bandwidth or storage space, or if it is too costly to mirror the data object in its entirety in every node. At some time after the creation of this coded storage, a data collector attempts to recover the original data object by accessing only the data stored in a random subset $r$ of the nodes, where $r$ is to be specified by the assumed access model or failure model (nodes or links may fail probabilistically, for example).

Fig. 1. Information flows in a distributed storage system. The source s has a data object of unit size which it can code and store over n storage nodes. Subsequently, a data collector t attempts to recover the original data object by accessing each of the n nodes independently with probability p.

By using a good coding scheme that enables successful recovery whenever the total amount of data accessed by the data collector is at least the size of the original data object, we can decouple the problems of (i) allocating the given budget among the nodes, that is, determining the values of $x_1, \dots, x_n$, and (ii) designing a coding scheme for such an allocation. (This can be achieved with a suitable MDS code, or with random linear codes, for example.) Consequently, the probability of successful recovery for an allocation $\{x_1, \dots, x_n\}$ can be written as

$$\mathbb{P}[\text{successful recovery}] = \mathbb{P}\Big[\sum_{i \in r} x_i \ge 1\Big].$$

Our goal is to find an optimal allocation that maximizes this recovery probability, subject to the given budget constraint.
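As a minimal sketch, assuming the independent-access model introduced below (each node accessed with probability $p$) and a small $n$ so that all $2^n$ access patterns can be enumerated, this recovery probability can be computed exactly. The helper name `recovery_probability` is our own (hypothetical), not the paper's; exact `Fraction` arithmetic avoids rounding errors at the threshold 1.

```python
from fractions import Fraction
from itertools import product


def recovery_probability(x, p):
    """Exact P[sum of x_i over accessed nodes >= 1] by brute force.

    Each of the n nodes is accessed independently with probability p, so we
    enumerate all 2^n access patterns (feasible only for small n).
    """
    n = len(x)
    p = Fraction(p)
    total = Fraction(0)
    for pattern in product((0, 1), repeat=n):
        accessed = sum((Fraction(xi) for xi, a in zip(x, pattern) if a), Fraction(0))
        if accessed >= 1:
            k = sum(pattern)  # number of accessed nodes
            total += p**k * (1 - p) ** (n - k)
    return total


# Two nodes holding one full copy each: recovery fails only if both are missed.
print(recovery_probability([1, 1], Fraction(1, 2)))  # 3/4
```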

In this paper, we assume a natural access model in which the data collector accesses each of the $n$ storage nodes independently with probability $p$, as depicted in Fig. 1. In other words, each node $i$ appears in subset $r$ independently with probability $p$. The resulting problem can be interpreted as that of maximizing the reliability of data storage in a system where each node fails independently with probability $(1-p)$. It turns out that this is a challenging nonconvex optimization problem, despite the simplicity of its formulation. The problem was investigated by several people at UC Berkeley [1], and has led to recent work on distributed storage allocation (see, e.g., [1]-[5]). The reader is encouraged to work out some small examples to understand where the complexity of the problem lies. One may expect to always find an optimal allocation that is symmetric, i.e., with all nonzero $x_i$ being equal, but this intuition is incorrect. For instance, the following counterexample shows that symmetric allocations can be suboptimal: Given $(n,p,T) = (5, \frac{2}{3}, \frac{7}{3})$, the nonsymmetric allocation $\{\frac{2}{3}, \frac{2}{3}, \frac{1}{3}, \frac{1}{3}, \frac{1}{3}\}$ yields a recovery probability of 0.90535, which is strictly greater than the recovery probabilities for the five symmetric allocations, of which $\{\frac{7}{6}, \frac{7}{6}, 0, 0, 0\}$ and $\{\frac{7}{12}, \frac{7}{12}, \frac{7}{12}, \frac{7}{12}, 0\}$ achieve the highest recovery probability of 0.88889. In this case, maximal spreading of the budget over all nodes, i.e., assigning $x_i = \frac{T}{n} = \frac{7}{15}$ for all $i$, turns out to perform poorly (recovery probability 0.79012), even though one may expect greater reliability from "spreading eggs over multiple baskets."
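The counterexample can be reproduced with a short brute-force sketch, assuming exact rational arithmetic; the helper below is our own illustration, not code from the paper. It evaluates the nonsymmetric allocation and all five symmetric allocations for $(n,p,T) = (5, \frac{2}{3}, \frac{7}{3})$.

```python
from fractions import Fraction as F
from itertools import product


def recovery_probability(x, p):
    # Exact brute force over all 2^n access patterns (small n only).
    n = len(x)
    total = F(0)
    for pattern in product((0, 1), repeat=n):
        if sum((xi for xi, a in zip(x, pattern) if a), F(0)) >= 1:
            k = sum(pattern)
            total += p**k * (1 - p) ** (n - k)
    return total


n, p, T = 5, F(2, 3), F(7, 3)
nonsymmetric = [F(2, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3)]
# The five symmetric allocations x(n, T, m) for m = 1, ..., 5.
symmetric = {m: [T / m] * m + [F(0)] * (n - m) for m in range(1, n + 1)}

print(round(float(recovery_probability(nonsymmetric, p)), 5))  # 0.90535
# m = 1..5 give 0.66667, 0.88889, 0.74074, 0.88889, 0.79012
for m, x in symmetric.items():
    print(m, round(float(recovery_probability(x, p)), 5))
```

Exact values are $\frac{220}{243}$ for the nonsymmetric allocation and $\frac{8}{9}$ for the best symmetric ones, so the gap is not a rounding artifact.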

Our Contribution: In this paper, we show that the intuitive symmetric allocation that spreads the budget maximally over all nodes is indeed asymptotically optimal in a regime of interest. Specifically, we derive an upper bound for the suboptimality of this allocation, and show that the performance gap vanishes asymptotically as the total number of storage nodes $n$ grows, when $T > \frac{1}{p}$. This is a regime of interest because a high probability of successful recovery is possible when $T > \frac{1}{p} \iff pT > 1$: The expected total amount of data accessed by the data collector is given by

$$\mathbb{E}\Big[\sum_{i=1}^{n} x_i Y_i\Big] = \sum_{i=1}^{n} x_i\, \mathbb{E}[Y_i] = p \sum_{i=1}^{n} x_i \le pT,$$

where the $Y_i$'s (with $Y_i = 1$ iff node $i$ is accessed) are independent Bernoulli($p$) random variables. Therefore, the data collector would be able to access a sufficient amount of data in expectation for successful recovery if $pT > 1$. In addition, we explicitly find the optimal symmetric allocation for a wide range of parameter values of $p$ and $T$.

Related Work: Jain et al. [2] evaluated the performance of symmetric allocations experimentally in the context of routing in a delay tolerant network (DTN). The authors also presented an alternative formulation using Gaussian distributions to model partial access to nodes. Note that the related theoretical claims found in [2] and its associated technical report contain some proofs that are incomplete and partially inaccurate. In [3]-[5], a different access model was considered in which the data collector accesses a random fixed-size subset of nodes. Various storage allocation problems have also been studied in a nonprobabilistic setting, with the objective of minimizing the total storage budget required to satisfy a given set of recovery requirements in a network (see, e.g., [6], [7]).

In the next section, we define the problem formally and 
state our main results, which are then proved in the following 
section. 

II. Problem Definition and Main Results 
We adopt the following notation throughout the paper:

$n$     total number of storage nodes, $n \ge 2$
$p$     access probability, $0 < p < 1$
$x_i$   amount of data stored in storage node $i$, $x_i \ge 0$, where $i \in \{1, \dots, n\}$
$T$     total storage budget, $1 \le T \le n$

Allocations are expressed as multisets, e.g., $\{1, 1, 0, 0\}$, and we write "$B(n,p)$" as shorthand for the binomial random variable with $n$ trials and success probability $p$.

We consider the storage allocation problem where the data collector accesses each of the $n$ storage nodes independently with probability $p$; successful recovery occurs iff the total amount of data stored in the accessed nodes is at least 1. We seek an optimal allocation $\{x_1, \dots, x_n\}$, among all allocations of the budget $T$, that maximizes the probability of successful recovery for a given choice of $n$, $p$, and $T$. This optimization problem can be expressed as follows:

$$
\begin{aligned}
\Pi(n,p,T) \triangleq \underset{x_1,\dots,x_n}{\text{maximize}} \quad & \sum_{r \in \mathcal{P}(\{1,\dots,n\})} p^{|r|}(1-p)^{n-|r|}\, \mathbb{I}\Big[\sum_{i \in r} x_i \ge 1\Big] \\
\text{subject to} \quad & \sum_{i=1}^{n} x_i \le T, \\
& x_i \ge 0 \quad \forall\, i \in \{1,\dots,n\},
\end{aligned}
$$

where $\mathcal{P}(S)$ denotes the power set of $S$, and $\mathbb{I}[G] = 1$ if statement $G$ is true, and $0$ otherwise. For the trivial budget $T = 1$, the optimal allocation is $\{1, 0, \dots, 0\}$; for $T = n$, the optimal allocation is $\{1, \dots, 1\}$. The problem is difficult in general because the objective function is discrete and nonconvex, and there is a large space of feasible allocations to consider.

Let $\mathbf{x}(n,T,m)$ be the symmetric allocation for $n$ nodes that uses a total storage of $T$ and contains exactly $m \in \{1, 2, \dots, n\}$ nonempty nodes, that is,

$$\mathbf{x}(n,T,m) \triangleq \Big\{\underbrace{\tfrac{T}{m}, \dots, \tfrac{T}{m}}_{m \text{ terms}},\ \underbrace{0, \dots, 0}_{(n-m) \text{ terms}}\Big\}.$$

Our first result bounds the suboptimality of the symmetric allocation $\mathbf{x}(n,T,m{=}n)$, and shows that its recovery probability approaches that of an optimal allocation as $n$ goes to infinity when $T > \frac{1}{p}$:

Theorem 1. The gap between the probabilities of successful recovery for an optimal allocation and for the symmetric allocation $\mathbf{x}(n,T,m{=}n)$ is at most

$$\delta(n,p,T) \triangleq pT \cdot \mathbb{P}\Big[B(n-1,p) \le \Big\lceil \frac{n}{T} \Big\rceil - 2\Big].$$

If $p$ and $T$ are fixed such that $T > \frac{1}{p}$, then this gap approaches zero as $n$ goes to infinity.

The regime $T > \frac{1}{p}$ is of interest because the recovery probability would be bounded away from 1 if $T < \frac{1}{p} \iff pT < 1$ instead. This follows from the application of Markov's inequality to the random variable $W$ denoting the total amount of data accessed by the data collector, which gives $\mathbb{P}[W \ge 1] \le \mathbb{E}[W]$. Since $\mathbb{P}[W \ge 1]$ is just the probability of successful recovery, and $\mathbb{E}[W] \le pT$ as shown in the introduction, we have

$$\mathbb{P}[\text{successful recovery}] \le pT.$$
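A small exact check of the two facts used in this argument, $\mathbb{E}[W] = p\sum_i x_i$ and $\mathbb{P}[W \ge 1] \le \mathbb{E}[W]$, can be sketched as follows. The allocations and the parameters $(p, T) = (\frac{1}{5}, 2)$ are arbitrary illustrative choices (so $pT < 1$), and the helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def access_outcomes(x, p):
    """Yield (W, probability) for each of the 2^n access patterns."""
    n = len(x)
    for pattern in product((0, 1), repeat=n):
        k = sum(pattern)
        w = sum((xi for xi, a in zip(x, pattern) if a), F(0))  # data accessed
        yield w, p**k * (1 - p) ** (n - k)


def expected_data(x, p):
    return sum((w * prob for w, prob in access_outcomes(x, p)), F(0))


def recovery_probability(x, p):
    return sum((prob for w, prob in access_outcomes(x, p) if w >= 1), F(0))


p, T = F(1, 5), F(2)  # pT = 2/5 < 1
for x in ([F(1, 2)] * 4, [F(2), F(0), F(0), F(0)], [F(1), F(1), F(0), F(0)]):
    assert sum(x) == T
    assert expected_data(x, p) == p * T          # E[W] = p * sum(x_i)
    assert recovery_probability(x, p) <= p * T   # Markov: P[W >= 1] <= E[W]
```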
Fig. 2. Plot of probability of successful recovery $P_s$ against budget $T$ for each symmetric allocation $\mathbf{x}(n,T,m)$, for (n,p) = (20, 4). Parameter $m$ denotes the number of nonempty nodes in the symmetric allocation. The black curve gives an upper bound for the recovery probability of an optimal allocation, as derived in Lemma 1.

Fig. 3. Plot of access probability $p$ against budget $T$, showing regions of $(T,p)$ over which the sufficient conditions of the theorems are satisfied. The black dashed curve marks the points satisfying $p = \frac{1}{T}$. "Maximal spreading" is optimal among symmetric allocations in the colored regions above the curve, while "minimal spreading" is optimal among symmetric allocations in the colored regions below the curve.



The rest of our results deal with the optimization problem restricted to symmetric allocations. The problem appears nontrivial despite this simplification, as demonstrated by Fig. 2, which compares the performance of different symmetric allocations over different budgets, for a particular choice of $n$ and $p$; the value of $m$ corresponding to the optimal symmetric allocation can change drastically with varying budget.

Fortunately, as we shall see in the following section, the recovery probability for a symmetric allocation can be expressed as the tail probability of a binomial distribution. This facilitates analysis and enables us to provide a sufficient condition for "maximal spreading" to be optimal among symmetric allocations (Theorem 2), and for "minimal spreading" to be optimal among symmetric allocations (Theorem 3):



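Theorem 2. If $T \ge \lceil \frac{4}{3p} \rceil$, then $\mathbf{x}(n,T,m{=}\lfloor \lfloor \frac{n}{T} \rfloor T \rfloor)$ or $\mathbf{x}(n,T,m{=}n)$ is an optimal symmetric allocation. Both candidate allocations are identical when $\frac{n}{T} \in \mathbb{Z}^+$, i.e., $T = \frac{n}{1}, \frac{n}{2}, \frac{n}{3}, \dots$

Theorem 3. If $T \le \lfloor \frac{1}{p} \rfloor$, then $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is an optimal symmetric allocation.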

Fig. 3 summarizes these theorems in the form of a region plot. Our results cover all choices of $p$ and $T$ except for the gap around $p = \frac{1}{T}$, which diminishes with increasing $T$. "Minimal spreading" and "maximal spreading" may both be suboptimal among symmetric allocations in this gap; for example, $\mathbf{x}(n,T,m{=}\lfloor 2T \rfloor)$ and $\mathbf{x}(n,T,m{=}\lfloor 3T \rfloor)$ are the optimal symmetric allocations for (n, p, T) = (10, ^, |) and (10, |, ^), respectively. In general, for any access probability $p$, the optimal symmetric allocation eventually changes from "minimal spreading" to "maximal spreading" as budget $T$ increases. This transition, which is not necessarily sharp, appears to occur at around $T = \frac{1}{p}$. Interestingly, when $T = \frac{1}{p}$ exactly, we observe numerically that $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is the optimal symmetric allocation for most values of $T$; the optimal symmetric allocation changes continuously over the intervals $1.5 < T < 2$ and $2.5 < T < 2.8911$, while $\mathbf{x}(n,T,m{=}\lfloor 2T \rfloor)$ is optimal for $3.5 < T < 3.5694$. These findings suggest that it may be difficult to specify an optimal symmetric allocation for values of $p$ and $T$ in the gap; we can, however, restrict our search for an optimal symmetric allocation to $\lceil \frac{n}{T} \rceil$ candidates, as explained in the next section.
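Both theorems can be spot-checked numerically with an exhaustive search over $m$, assuming the binomial-tail form $P_s(p,T,m) = \mathbb{P}[B(m,p) \ge \lceil m/T \rceil]$ derived in Section III. This is a sketch for a handful of hand-picked $(p,T)$ pairs with $n = 20$, not a proof; the function names are hypothetical.

```python
from fractions import Fraction as F
from math import ceil, comb, floor


def ps(p, T, m):
    """P_s(p, T, m) = P[B(m, p) >= ceil(m/T)] for the symmetric allocation x(n, T, m)."""
    k = ceil(F(m) / T)
    return sum((comb(m, r) * p**r * (1 - p) ** (m - r) for r in range(k, m + 1)), F(0))


def best_symmetric(n, p, T):
    """Set of all m in 1..n maximizing P_s (exact arithmetic, so ties are exact)."""
    values = {m: ps(p, T, m) for m in range(1, n + 1)}
    best = max(values.values())
    return {m for m, v in values.items() if v == best}


n = 20
# Theorem 3 regime (T <= floor(1/p)): m = floor(T) should be among the maximizers.
for p, T in [(F(1, 4), F(3, 2)), (F(1, 4), F(4)), (F(1, 5), F(7, 2))]:
    assert floor(T) in best_symmetric(n, p, T)

# Theorem 2 regime (T >= ceil(4/(3p))): a maximizer lies in {floor(floor(n/T) T), n}.
for p, T in [(F(1, 2), F(3)), (F(1, 2), F(7, 2)), (F(2, 3), F(5, 2))]:
    assert {floor(floor(F(n) / T) * T), n} & best_symmetric(n, p, T)
```

Returning the whole set of maximizers, rather than a single `argmax`, keeps the checks honest when two symmetric allocations tie exactly.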

III. Analysis 

To prove Theorem 1, we need an upper bound for the probability of successful recovery for an optimal allocation (over all symmetric and nonsymmetric allocations):

Lemma 1. The probability of successful recovery for an optimal allocation is at most

$$\sum_{r=0}^{n} \min\Big(\frac{r}{n}T,\ 1\Big)\, \mathbb{P}[B(n,p) = r].$$



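Proof of Lemma 1. Consider a feasible allocation $\{x_1, \dots, x_n\}$; we have $\sum_{i=1}^{n} x_i \le T$, where $x_i \ge 0$, $i = 1, \dots, n$. Let $S_r$ denote the number of $r$-subsets of $\{x_1, \dots, x_n\}$ that have a sum of at least 1, where $r \in \{1, \dots, n\}$. By conditioning on the number of nodes accessed by the data collector, the probability of successful recovery for this allocation can be written as

$$
\begin{aligned}
\mathbb{P}[\text{successful recovery}]
&= \sum_{r=1}^{n} \mathbb{P}[\text{successful recovery} \mid \text{exactly } r \text{ nodes were accessed}] \cdot \mathbb{P}[\text{exactly } r \text{ nodes were accessed}] \\
&= \sum_{r=1}^{n} \frac{S_r}{\binom{n}{r}}\, \mathbb{P}[B(n,p) = r],
\end{aligned}
\tag{1}
$$

where the second equality uses the fact that, given exactly $r$ accessed nodes, every $r$-subset is equally likely.

We proceed to find an upper bound for $S_r$. For a given $r$, we can write $S_r$ inequalities of the form $x'_1 + \cdots + x'_r \ge 1$. Summing up these $S_r$ inequalities produces an inequality of the form $a_1 x_1 + \cdots + a_n x_n \ge S_r$. Since each $x_i$ belongs to exactly $\binom{n-1}{r-1}$ distinct $r$-subsets of $\{x_1, \dots, x_n\}$, it follows that $0 \le a_i \le \binom{n-1}{r-1}$, $i = 1, \dots, n$. Therefore,

$$S_r \le a_1 x_1 + \cdots + a_n x_n \le \binom{n-1}{r-1} \sum_{i=1}^{n} x_i \le \binom{n-1}{r-1} T.$$

Since $S_r$ is also at most $\binom{n}{r}$, i.e., the total number of $r$-subsets, we have $S_r \le \min\big(\binom{n-1}{r-1} T, \binom{n}{r}\big)$. Substituting this bound into (1) and noting that $\binom{n-1}{r-1} / \binom{n}{r} = \frac{r}{n}$ completes the proof. ■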

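Proof of Theorem 1. The probability of successful recovery for the symmetric allocation $\mathbf{x}(n,T,m{=}n)$ is the probability of accessing at least $\lceil 1/(\frac{T}{n}) \rceil = \lceil \frac{n}{T} \rceil$ nodes, which is $\sum_{r=\lceil n/T \rceil}^{n} \mathbb{P}[B(n,p) = r]$. The suboptimality gap for this allocation is therefore at most the difference between the upper bound of Lemma 1 and its recovery probability. Since the minimum in Lemma 1 equals $\frac{r}{n}T$ for $r \le \lceil \frac{n}{T} \rceil - 1 < \frac{n}{T}$ and equals 1 for $r \ge \lceil \frac{n}{T} \rceil$, this difference is given by

$$\sum_{r=1}^{\lceil n/T \rceil - 1} \frac{r}{n} T \binom{n}{r} p^r (1-p)^{n-r} = pT \sum_{r=1}^{\lceil n/T \rceil - 1} \binom{n-1}{r-1} p^{r-1} (1-p)^{n-r} = pT \cdot \mathbb{P}\Big[B(n-1,p) \le \Big\lceil \frac{n}{T} \Big\rceil - 2\Big],$$

as required. Assuming now that $T > \frac{1}{p}$, we have

$$
\begin{aligned}
\delta(n,p,T) &\le pT \cdot \mathbb{P}\Big[B(n-1,p) \le \frac{n-1}{T}\Big] && \text{since } \Big\lceil \frac{n}{T} \Big\rceil - 2 < \frac{n}{T} + 1 - 2 \le \frac{n}{T} - \frac{1}{T} \\
&= pT \cdot \mathbb{P}\Big[B(n-1,p) \le \frac{1}{pT}(n-1)p\Big] \\
&\le pT \exp\Big(-\frac{(n-1)p}{2}\Big(1 - \frac{1}{pT}\Big)^2\Big).
\end{aligned}
\tag{2}
$$

Inequality (2) follows from the observation that $\frac{1}{pT} \in (0,1)$, and the subsequent application of the Chernoff bound for deviation below the mean of the binomial distribution (see, e.g., [8]). For fixed $p$ and $T$, this upper bound approaches zero as $n$ goes to infinity. ■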
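The quantities in Lemma 1 and Theorem 1 are easy to evaluate exactly for moderate $n$. The sketch below, with illustrative parameters $(p,T) = (\frac{4}{5}, 3)$ chosen by us so that $pT > 1$, checks that the Lemma 1 bound minus the recovery probability of $\mathbf{x}(n,T,m{=}n)$ equals $\delta(n,p,T)$ exactly, and that $\delta(n,p,T)$ stays below the Chernoff expression in (2). All function names are hypothetical.

```python
from fractions import Fraction as F
from math import ceil, comb, exp


def pmf(n, p, r):
    """P[B(n, p) = r]."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def lemma1_upper_bound(n, p, T):
    """Lemma 1: sum_r min(rT/n, 1) * P[B(n, p) = r]."""
    return sum((min(F(r) * T / n, 1) * pmf(n, p, r) for r in range(n + 1)), F(0))


def ps_max_spread(n, p, T):
    """x(n, T, m=n) succeeds iff at least ceil(n/T) nodes are accessed."""
    return sum((pmf(n, p, r) for r in range(ceil(F(n) / T), n + 1)), F(0))


def gap_bound(n, p, T):
    """delta(n, p, T) = pT * P[B(n-1, p) <= ceil(n/T) - 2]."""
    top = ceil(F(n) / T) - 2  # may be -1, giving an empty sum
    return p * T * sum((pmf(n - 1, p, r) for r in range(top + 1)), F(0))


def chernoff_bound(n, p, T):
    """Right-hand side of (2); meaningful when pT > 1."""
    pt = float(p * T)
    return pt * exp(-(n - 1) * float(p) / 2 * (1 - 1 / pt) ** 2)


p, T = F(4, 5), F(3)
for n in (10, 20, 50):
    gap = lemma1_upper_bound(n, p, T) - ps_max_spread(n, p, T)
    assert gap == gap_bound(n, p, T)              # the identity in the proof
    assert float(gap) <= chernoff_bound(n, p, T)  # inequality (2)
```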

Before proceeding with the proofs on the optimal symmetric allocation, we make a number of important observations about symmetric allocations in general. Successful recovery for the symmetric allocation $\mathbf{x}(n,T,m)$ occurs iff at least $\lceil 1/(\frac{T}{m}) \rceil = \lceil \frac{m}{T} \rceil$ nonempty nodes are accessed. Therefore, the corresponding probability of successful recovery can be written as

$$P_s(p,T,m) = \mathbb{P}\Big[B(m,p) \ge \Big\lceil \frac{m}{T} \Big\rceil\Big].$$
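A quick way to confirm this binomial-tail form is to compare it against brute-force enumeration of access patterns. The following sketch does so for $n = 6$ and a few arbitrary $(p,T)$ pairs; the helper names are ours, and exact fractions make the comparison an equality test.

```python
from fractions import Fraction as F
from itertools import product
from math import ceil, comb


def ps(p, T, m):
    """P_s(p, T, m) = P[B(m, p) >= ceil(m/T)]."""
    k = ceil(F(m) / T)
    return sum((comb(m, r) * p**r * (1 - p) ** (m - r) for r in range(k, m + 1)), F(0))


def symmetric(n, T, m):
    """x(n, T, m): m nodes holding T/m each, followed by n - m empty nodes."""
    return [F(T) / m] * m + [F(0)] * (n - m)


def brute_force(x, p):
    """Exact recovery probability by enumerating all access patterns."""
    n = len(x)
    total = F(0)
    for pattern in product((0, 1), repeat=n):
        if sum((xi for xi, a in zip(x, pattern) if a), F(0)) >= 1:
            k = sum(pattern)
            total += p**k * (1 - p) ** (n - k)
    return total


n = 6
for p in (F(1, 3), F(3, 4)):
    for T in (F(3, 2), F(7, 3), F(4)):
        for m in range(1, n + 1):
            assert ps(p, T, m) == brute_force(symmetric(n, T, m), p)
```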



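Given $n$, $p$, and $T$, we have $\lceil \frac{m}{T} \rceil = k$ when $m \in ((k-1)T, kT]$, for $k = 1, 2, \dots, \lfloor \frac{n}{T} \rfloor$, and finally, $\lceil \frac{m}{T} \rceil = \lfloor \frac{n}{T} \rfloor + 1$ when $m \in (\lfloor \frac{n}{T} \rfloor T, n]$. Since $\mathbb{P}[B(m,p) \ge k]$ is nondecreasing in $m$ for constant $p$ and $k$, it follows that $P_s(p,T,m)$ is maximized over each of these intervals of $m$ when we pick $m$ to be the largest integer in the corresponding interval. Thus, given $n$, $p$, and $T$, we can find an optimal $m^*$ that maximizes $P_s(p,T,m)$ over all $m$ from among $\lceil \frac{n}{T} \rceil$ candidates:

$$m^* \in \Big\{\lfloor T \rfloor, \lfloor 2T \rfloor, \dots, \Big\lfloor \Big\lfloor \frac{n}{T} \Big\rfloor T \Big\rfloor, n\Big\}. \tag{3}$$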
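The candidate set (3) translates directly into a search procedure. As a minimal sketch (hypothetical helper names, exact arithmetic), the code below builds the candidates, confirms there are $\lceil n/T \rceil$ of them, and checks that searching only the candidates attains the same maximum as searching every $m \in \{1, \dots, n\}$.

```python
from fractions import Fraction as F
from math import ceil, comb, floor


def ps(p, T, m):
    """P_s(p, T, m) = P[B(m, p) >= ceil(m/T)]."""
    k = ceil(F(m) / T)
    return sum((comb(m, r) * p**r * (1 - p) ** (m - r) for r in range(k, m + 1)), F(0))


def candidates(n, T):
    """Candidate set (3): floor(kT) for k = 1..floor(n/T), together with n."""
    cands = {floor(k * T) for k in range(1, floor(F(n) / T) + 1)}
    cands.add(n)
    return sorted(cands)


def best_m(n, p, T):
    """Return (m*, P_s) over the candidates, preferring the smallest m on ties."""
    return max(((m, ps(p, T, m)) for m in candidates(n, T)),
               key=lambda pair: (pair[1], -pair[0]))


print(candidates(10, F(7, 3)))  # [2, 4, 7, 9, 10]
```

Because $T \ge 1$, the values $\lfloor kT \rfloor$ are strictly increasing in $k$, so the only possible coincidence in the set is $\lfloor \lfloor n/T \rfloor T \rfloor = n$ when $n/T \in \mathbb{Z}^+$.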



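For $m = \lfloor kT \rfloor$, where $k \in \mathbb{Z}^+$, the corresponding probability of successful recovery is given by $P_s(p,T,m{=}\lfloor kT \rfloor) = \mathbb{P}[B(\lfloor kT \rfloor, p) \ge k]$. The difference between the probabilities of successful recovery for consecutive values of $k \in \mathbb{Z}^+$ can be written as

$$
\begin{aligned}
\Delta(p,T,k) &\triangleq P_s(p,T,m{=}\lfloor (k+1)T \rfloor) - P_s(p,T,m{=}\lfloor kT \rfloor) \\
&= \mathbb{P}[B(\lfloor (k+1)T \rfloor, p) \ge k+1] - \mathbb{P}[B(\lfloor kT \rfloor, p) \ge k] \\
&= \sum_{i=1}^{\min(\alpha_{k,T}-1,\, k)} \mathbb{P}[B(\lfloor kT \rfloor, p) = k-i] \cdot \mathbb{P}[B(\alpha_{k,T}, p) \ge i+1] \\
&\quad - \mathbb{P}[B(\lfloor kT \rfloor, p) = k] \cdot \mathbb{P}[B(\alpha_{k,T}, p) = 0],
\end{aligned}
$$

where $\alpha_{k,T} \triangleq \lfloor (k+1)T \rfloor - \lfloor kT \rfloor$. The above expression is obtained by comparing the branches of the probability tree for $\lfloor kT \rfloor$ vs. $\lfloor (k+1)T \rfloor$ independent Bernoulli trials: the first term describes unsuccessful events ("$B(\lfloor kT \rfloor, p) < k$") becoming successful ("$B(\lfloor (k+1)T \rfloor, p) \ge k+1$") after the additional $\alpha_{k,T}$ trials, while the second term describes successful events ("$B(\lfloor kT \rfloor, p) \ge k$") becoming unsuccessful ("$B(\lfloor (k+1)T \rfloor, p) < k+1$") after the additional $\alpha_{k,T}$ trials. After further simplification, we arrive at

$$\Delta(p,T,k) = p^k (1-p)^{\lfloor (k+1)T \rfloor - k} \left[\sum_{i=1}^{\min(\alpha_{k,T}-1,\, k)} \sum_{j=i+1}^{\alpha_{k,T}} \binom{\lfloor kT \rfloor}{k-i} \binom{\alpha_{k,T}}{j} \Big(\frac{p}{1-p}\Big)^{j-i} - \binom{\lfloor kT \rfloor}{k}\right]. \tag{4}$$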
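Since (4) drives both Lemma 2 and Lemma 3, it is worth confirming that the closed form agrees with the definition of $\Delta(p,T,k)$. The sketch below compares the two exactly over a small grid of $(p, T, k)$ values of our own choosing; it is a numerical sanity check, not part of the proof, and the function names are hypothetical.

```python
from fractions import Fraction as F
from math import comb, floor


def tail(m, p, k):
    """P[B(m, p) >= k]."""
    return sum((comb(m, r) * p**r * (1 - p) ** (m - r) for r in range(k, m + 1)), F(0))


def delta_direct(p, T, k):
    """Delta(p, T, k) straight from its definition."""
    return tail(floor((k + 1) * T), p, k + 1) - tail(floor(k * T), p, k)


def delta_closed(p, T, k):
    """Delta(p, T, k) via the closed form (4)."""
    a, b = floor(k * T), floor((k + 1) * T)
    alpha = b - a  # alpha_{k,T}
    q = p / (1 - p)
    s = sum((comb(a, k - i) * comb(alpha, j) * q ** (j - i)
             for i in range(1, min(alpha - 1, k) + 1)
             for j in range(i + 1, alpha + 1)), F(0))
    return p**k * (1 - p) ** (b - k) * (s - comb(a, k))


for p in (F(1, 5), F(1, 2), F(3, 4)):
    for T in (F(1), F(3, 2), F(7, 3), F(5, 2), F(4)):
        for k in range(1, 5):
            assert delta_direct(p, T, k) == delta_closed(p, T, k)
```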

Lemma 2 essentially states a sufficient condition on $p$ and $T$ for $\Delta(p,T,k) \ge 0$ for any $k \in \mathbb{Z}^+$, thereby eliminating all but the two largest candidate values for $m^*$ in (3):

Lemma 2. If $T \ge 2$ such that

$$(1-p)^{\lfloor T \rfloor} + 2\lfloor T \rfloor p (1-p)^{\lfloor T \rfloor - 1} - 1 \le 0, \tag{5}$$

then $\mathbf{x}(n,T,m{=}\lfloor \lfloor \frac{n}{T} \rfloor T \rfloor)$ or $\mathbf{x}(n,T,m{=}n)$ is an optimal symmetric allocation.

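Proof of Lemma 2. Suppose that $T \ge 2$. We will show that if condition (5) is satisfied, then $\Delta(p,T,k) \ge 0$ for any $k \in \mathbb{Z}^+$. First, we note that

$$\frac{\binom{\lfloor kT \rfloor}{k-1}}{\binom{\lfloor kT \rfloor}{k}} = \frac{k}{\lfloor kT \rfloor - k + 1} = \frac{k}{k\lfloor T \rfloor + \lfloor k\tau \rfloor - k + 1} \ge \frac{k}{k\lfloor T \rfloor} = \frac{1}{\lfloor T \rfloor}, \tag{6}$$

where $\tau \triangleq T - \lfloor T \rfloor \in [0,1)$; the inequality holds because $\lfloor k\tau \rfloor \le k\tau < k$ implies $\lfloor k\tau \rfloor \le k-1$, i.e., $\lfloor k\tau \rfloor - k + 1 \le 0$.

Now, if condition (5) is satisfied, then

$$
\begin{aligned}
& (1-p)^{\lfloor T \rfloor} + 2\lfloor T \rfloor p (1-p)^{\lfloor T \rfloor - 1} - 1 \le 0 \\
\iff\ & \mathbb{P}[B(\lfloor T \rfloor, p) = 0] + 2\,\mathbb{P}[B(\lfloor T \rfloor, p) = 1] - 1 \le 0 \\
\iff\ & \mathbb{P}[B(\lfloor T \rfloor, p) \ge 2] \ge \mathbb{P}[B(\lfloor T \rfloor, p) = 1] \\
\iff\ & \sum_{j=2}^{\lfloor T \rfloor} \binom{\lfloor T \rfloor}{j} p^j (1-p)^{\lfloor T \rfloor - j} \ge \lfloor T \rfloor p (1-p)^{\lfloor T \rfloor - 1} \\
\iff\ & \frac{1}{\lfloor T \rfloor} \sum_{j=2}^{\lfloor T \rfloor} \binom{\lfloor T \rfloor}{j} \Big(\frac{p}{1-p}\Big)^{j-1} \ge 1.
\end{aligned}
\tag{7}
$$

Since $\binom{\lceil T \rceil}{j} \ge \binom{\lfloor T \rfloor}{j}$ and every term is nonnegative, it also follows that

$$\frac{1}{\lfloor T \rfloor} \sum_{j=2}^{\lceil T \rceil} \binom{\lceil T \rceil}{j} \Big(\frac{p}{1-p}\Big)^{j-1} \ge 1. \tag{8}$$

Observe that $\alpha_{k,T} = \lfloor (k+1)T \rfloor - \lfloor kT \rfloor \in \{\lfloor T \rfloor, \lceil T \rceil\}$, because $\alpha_{k,T} \in (T-1, T+1)$ and there are only two integers $\lfloor T \rfloor$ and $\lceil T \rceil$, which are possibly nondistinct, in this interval. It follows from (7) and (8) that

$$\frac{1}{\lfloor T \rfloor} \sum_{j=2}^{\alpha_{k,T}} \binom{\alpha_{k,T}}{j} \Big(\frac{p}{1-p}\Big)^{j-1} \ge 1. \tag{9}$$

Therefore, we have

$$
\begin{aligned}
\sum_{i=1}^{\min(\alpha_{k,T}-1,\, k)} \sum_{j=i+1}^{\alpha_{k,T}} \frac{\binom{\lfloor kT \rfloor}{k-i}}{\binom{\lfloor kT \rfloor}{k}} \binom{\alpha_{k,T}}{j} \Big(\frac{p}{1-p}\Big)^{j-i}
&\ge \sum_{j=2}^{\alpha_{k,T}} \frac{\binom{\lfloor kT \rfloor}{k-1}}{\binom{\lfloor kT \rfloor}{k}} \binom{\alpha_{k,T}}{j} \Big(\frac{p}{1-p}\Big)^{j-1} && \text{since } \min(\alpha_{k,T}-1, k) \ge \min(2-1, 1) = 1 \\
&\ge \frac{1}{\lfloor T \rfloor} \sum_{j=2}^{\alpha_{k,T}} \binom{\alpha_{k,T}}{j} \Big(\frac{p}{1-p}\Big)^{j-1} && \text{from (6)} \\
&\ge 1 && \text{from (9)}
\end{aligned}
$$

$\implies \Delta(p,T,k) \ge 0$, from (4).

Thus, we conclude that $P_s(p,T,m{=}\lfloor T \rfloor) \le P_s(p,T,m{=}\lfloor 2T \rfloor) \le \cdots \le P_s(p,T,m{=}\lfloor \lfloor \frac{n}{T} \rfloor T \rfloor)$, and so we can find an optimal $m^*$ from among $\{\lfloor \lfloor \frac{n}{T} \rfloor T \rfloor, n\}$. ■

Theorem 2 restates Lemma 2 in a slightly weaker but more convenient form:

Proof of Theorem 2. Since $p < 1$, we have $\lceil \frac{4}{3p} \rceil \ge \lceil \frac{4}{3} \rceil = 2$; moreover,

$$T \ge \Big\lceil \frac{4}{3p} \Big\rceil \implies \lfloor T \rfloor \ge \Big\lceil \frac{4}{3p} \Big\rceil \ge \frac{4}{3p} \implies p \ge \frac{4}{3\lfloor T \rfloor}.$$

It follows that if $T \ge \lceil \frac{4}{3p} \rceil$, then $T \ge 2$ and $p \ge \frac{4}{3\lfloor T \rfloor}$. We will show that condition (5) of Lemma 2 is satisfied for any $T \ge 2$ and $p \ge \frac{4}{3\lfloor T \rfloor}$. To do this, we define the function

$$f(p,T) \triangleq (1-p)^{\lfloor T \rfloor} + 2\lfloor T \rfloor p (1-p)^{\lfloor T \rfloor - 1} - 1,$$

and show that $f(p,T) \le f(p{=}\frac{4}{3\lfloor T \rfloor}, T) \le 0$ for any $T \ge 2$ and $p \ge \frac{4}{3\lfloor T \rfloor}$.

The partial derivative of $f(p,T)$ wrt $p$ is given by

$$\frac{\partial}{\partial p} f(p,T) = \lfloor T \rfloor (1-p)^{\lfloor T \rfloor - 2} \big(1 + p - 2\lfloor T \rfloor p\big).$$

Observe that $f(p,T)$ is decreasing wrt $p$ for any $T \ge 2$ and $p \ge \frac{4}{3\lfloor T \rfloor}$, since

$$p \ge \frac{4}{3\lfloor T \rfloor} > \frac{1}{2\lfloor T \rfloor - 1} \implies 1 + p - 2\lfloor T \rfloor p < 0 \implies \frac{\partial}{\partial p} f(p,T) < 0.$$

Now, consider the function

$$g(T) \triangleq f\Big(p{=}\frac{4}{3\lfloor T \rfloor}, T\Big) = \Big(1 - \frac{4}{3\lfloor T \rfloor}\Big)^{\lfloor T \rfloor - 1} \Big(\frac{11}{3} - \frac{4}{3\lfloor T \rfloor}\Big) - 1.$$

We will proceed to show that $g(T) \le 0$ for any $T \ge 2$. For $T \in [2,3)$, we have $\lfloor T \rfloor = 2$ and $g(T) = 0$. To show that $g(T) < 0$ for $T \ge 3$, consider the function

$$h(T) \triangleq (T-1) \ln\Big(1 - \frac{4}{3T}\Big) + \ln\Big(\frac{11}{3} - \frac{4}{3T}\Big),$$

which has the derivatives

$$h'(T) = \ln\Big(1 - \frac{4}{3T}\Big) + \frac{4(T-1)}{T(3T-4)} + \frac{4}{T(11T-4)}, \qquad h''(T) = \frac{16(11T^2 - 24T - 16)}{T(33T^2 - 56T + 16)^2}.$$

Since $h'(3) = \frac{84}{145} - \ln\frac{9}{5} < 0$, $\lim_{T \to \infty} h'(T) = 0$, and $h''(T) > 0$ for $T \ge 3$, it follows that $h'(T) < 0$ for $T \ge 3$. Now, since $h(3) = \ln\frac{29}{9} - 2\ln\frac{9}{5} < 0$, and $h'(T) < 0$ for $T \ge 3$, it follows that $h(T) < 0$ for $T \ge 3$. Thus, for $T \ge 3$, we have

$$
\begin{aligned}
h(\lfloor T \rfloor) < 0
&\iff (\lfloor T \rfloor - 1) \ln\Big(1 - \frac{4}{3\lfloor T \rfloor}\Big) + \ln\Big(\frac{11}{3} - \frac{4}{3\lfloor T \rfloor}\Big) < 0 \\
&\iff \Big(1 - \frac{4}{3\lfloor T \rfloor}\Big)^{\lfloor T \rfloor - 1} \Big(\frac{11}{3} - \frac{4}{3\lfloor T \rfloor}\Big) < 1 \\
&\iff g(T) < 0.
\end{aligned}
$$

Combining these results, we obtain

$$f(p,T) \le f\Big(p{=}\frac{4}{3\lfloor T \rfloor}, T\Big) = g(T) \le 0$$

for any $T \ge 2$ and $p \ge \frac{4}{3\lfloor T \rfloor}$. ■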
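The numerical facts used in this proof can be double-checked with a short sketch: $g(2) = 0$, $g(T) < 0$ for $\lfloor T \rfloor \ge 3$ (checked exactly for $\lfloor T \rfloor$ up to 79, a range we picked arbitrarily), the closed form of $h'(3)$ and $h(3)$, and the monotonicity of $f$ in $p$. Function names mirror the symbols in the proof but are our own code, not the authors'.

```python
from fractions import Fraction as F
from math import log


def f(p, L):
    """f(p, T) from the proof of Theorem 2, written in terms of L = floor(T)."""
    return (1 - p) ** L + 2 * L * p * (1 - p) ** (L - 1) - 1


def g(L):
    """g(T) = f(4 / (3 floor(T)), T), evaluated exactly."""
    return f(F(4, 3 * L), L)


def h(T):
    return (T - 1) * log(1 - 4 / (3 * T)) + log(11 / 3 - 4 / (3 * T))


def h_prime(T):
    return (log(1 - 4 / (3 * T)) + 4 * (T - 1) / (T * (3 * T - 4))
            + 4 / (T * (11 * T - 4)))


print(g(2), g(3))  # 0 -4/729
print(h_prime(3) < 0, h(3) < 0)  # True True
```

Both $h'(3)$ and $h(3)$ are negative by less than $0.01$, which is why the proof evaluates them in closed form rather than relying on rough estimates.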

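Lemma 3 mirrors Lemma 2 by stating a sufficient condition on $p$ and $T$ for $\Delta(p,T,k) \le 0$ for any $k \in \mathbb{Z}^+$, thereby eliminating all but the smallest candidate value for $m^*$ in (3):

Lemma 3. If $T \ge 1$ such that either

$$T = \frac{1}{p} \in \mathbb{Z}^+ \tag{10}$$

or

$$T < \frac{1}{p} \quad \text{and} \quad p(1-p)^{\lceil T \rceil - 1} \le \frac{1}{T}\Big(1 - \frac{1}{T}\Big)^{\lceil T \rceil - 1}, \tag{11}$$

then $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is an optimal symmetric allocation.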
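The key inequality behind Lemma 3 is that the double sum
$\sum_{i=1}^{\lceil T\rceil-1}\sum_{j=i+1}^{\lceil T\rceil}(\frac{1}{T-1})^i\binom{\lceil T\rceil}{j}(\frac{p}{1-p})^{j-i}$
is at most 1 under (10) or (11). The sketch below checks this exactly on a small grid of our own choosing, together with the closed form $1 + \frac{x(1+y)^c - y(1+x)^c}{y-x}$ (with $x = \frac{p}{1-p}$, $y = \frac{1}{T-1}$, $c = \lceil T \rceil$) used in the proof; helper names are hypothetical.

```python
from fractions import Fraction as F
from math import ceil, comb


def double_sum(c, T, p):
    """sum_{i=1}^{c-1} sum_{j=i+1}^{c} (1/(T-1))^i C(c, j) (p/(1-p))^(j-i), T > 1."""
    y = 1 / (F(T) - 1)
    x = p / (1 - p)
    return sum((y**i * comb(c, j) * x ** (j - i)
                for i in range(1, c) for j in range(i + 1, c + 1)), F(0))


def closed_form(c, T, p):
    """1 + (x(1+y)^c - y(1+x)^c) / (y - x), valid when x != y, i.e., pT != 1."""
    y = 1 / (F(T) - 1)
    x = p / (1 - p)
    return 1 + (x * (1 + y) ** c - y * (1 + x) ** c) / (y - x)


def condition_10(p, T):
    return F(T).denominator == 1 and p * T == 1


def condition_11(p, T):
    c = ceil(T)
    return p * T < 1 and p * (1 - p) ** (c - 1) <= (1 / F(T)) * (1 - 1 / F(T)) ** (c - 1)


# Under condition (10) the double sum equals 1 exactly.
for T in range(2, 9):
    assert double_sum(T, T, F(1, T)) == 1
```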

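Proof of Lemma 3. Suppose that $T > 1$. We will show that if condition (10) or condition (11) is satisfied, then $\Delta(p,T,k) \le 0$ for any $k \in \mathbb{Z}^+$. First, we note that for any $i \in \{1, \dots, k\}$,

$$\frac{\binom{\lfloor kT \rfloor}{k-i}}{\binom{\lfloor kT \rfloor}{k}} = \frac{\overbrace{k(k-1)\cdots(k-i+1)}^{i \text{ terms}}}{\underbrace{(\lfloor kT \rfloor - k + i)\cdots(\lfloor kT \rfloor - k + 2)(\lfloor kT \rfloor - k + 1)}_{i \text{ terms}}} \le \Big(\frac{k}{\lfloor kT \rfloor - k + 1}\Big)^i \le \Big(\frac{k}{kT - 1 - k + 1}\Big)^i = \Big(\frac{1}{T-1}\Big)^i, \tag{12}$$

since $kT - 1 < \lfloor kT \rfloor$.

Now, if condition (10) is satisfied, then $\lceil T \rceil = T$ and $\frac{p}{1-p} = \frac{1}{T-1}$, so

$$\sum_{i=1}^{T-1} \sum_{j=i+1}^{T} \Big(\frac{1}{T-1}\Big)^i \binom{T}{j} \Big(\frac{p}{1-p}\Big)^{j-i} = \sum_{i=1}^{T-1} \sum_{j=i+1}^{T} \binom{T}{j} \Big(\frac{1}{T-1}\Big)^{j} = \sum_{j=2}^{T} (j-1) \binom{T}{j} \Big(\frac{1}{T-1}\Big)^{j} = 1,$$

where the last equality follows from the identity $\sum_{j=2}^{T} (j-1)\binom{T}{j} z^j = (1+z)^{T-1}\big((T-1)z - 1\big) + 1$ evaluated at $z = \frac{1}{T-1}$.

On the other hand, if condition (11) is satisfied, write $x \triangleq \frac{p}{1-p}$ and $y \triangleq \frac{1}{T-1}$, so that $x < y$ (since $pT < 1$). Using $\sum_{i=1}^{j-1} y^i x^{j-i} = \frac{x y^j - y x^j}{y - x}$ and the binomial theorem, we obtain

$$\sum_{i=1}^{\lceil T \rceil - 1} \sum_{j=i+1}^{\lceil T \rceil} y^i \binom{\lceil T \rceil}{j} x^{j-i} = 1 + \frac{x(1+y)^{\lceil T \rceil} - y(1+x)^{\lceil T \rceil}}{y - x} \le 1,$$

where the inequality holds because $x(1+y)^{\lceil T \rceil} \le y(1+x)^{\lceil T \rceil}$ is equivalent to $p(1-p)^{\lceil T \rceil - 1} \le \frac{1}{T}(1 - \frac{1}{T})^{\lceil T \rceil - 1}$.

Thus, if either condition is satisfied, we have

$$\sum_{i=1}^{\lceil T \rceil - 1} \sum_{j=i+1}^{\lceil T \rceil} \Big(\frac{1}{T-1}\Big)^i \binom{\lceil T \rceil}{j} \Big(\frac{p}{1-p}\Big)^{j-i} \le 1, \tag{13}$$

and, since every term is nonnegative and $\binom{\lfloor T \rfloor}{j} \le \binom{\lceil T \rceil}{j}$,

$$\sum_{i=1}^{\lfloor T \rfloor - 1} \sum_{j=i+1}^{\lfloor T \rfloor} \Big(\frac{1}{T-1}\Big)^i \binom{\lfloor T \rfloor}{j} \Big(\frac{p}{1-p}\Big)^{j-i} \le 1. \tag{14}$$

As in the proof of Lemma 2, we note that $\alpha_{k,T} = \lfloor (k+1)T \rfloor - \lfloor kT \rfloor \in \{\lfloor T \rfloor, \lceil T \rceil\}$. It follows from (13) and (14) that

$$\sum_{i=1}^{\alpha_{k,T} - 1} \sum_{j=i+1}^{\alpha_{k,T}} \Big(\frac{1}{T-1}\Big)^i \binom{\alpha_{k,T}}{j} \Big(\frac{p}{1-p}\Big)^{j-i} \le 1.$$

Therefore, we have

$$
\begin{aligned}
\sum_{i=1}^{\min(\alpha_{k,T}-1,\, k)} \sum_{j=i+1}^{\alpha_{k,T}} \frac{\binom{\lfloor kT \rfloor}{k-i}}{\binom{\lfloor kT \rfloor}{k}} \binom{\alpha_{k,T}}{j} \Big(\frac{p}{1-p}\Big)^{j-i}
&\le \sum_{i=1}^{\alpha_{k,T} - 1} \sum_{j=i+1}^{\alpha_{k,T}} \Big(\frac{1}{T-1}\Big)^i \binom{\alpha_{k,T}}{j} \Big(\frac{p}{1-p}\Big)^{j-i} && \text{from (12), since } \min(\alpha_{k,T}-1, k) \le \alpha_{k,T} - 1 \\
&\le 1
\end{aligned}
$$

$\implies \Delta(p,T,k) \le 0$, from (4).

It follows that $P_s(p,T,m{=}\lfloor T \rfloor) \ge P_s(p,T,m{=}\lfloor 2T \rfloor) \ge P_s(p,T,m{=}\lfloor 3T \rfloor) \ge \cdots$. Since

$$P_s(p,T,m{=}n) \begin{cases} = P_s(p,T,m{=}\lfloor \lfloor \frac{n}{T} \rfloor T \rfloor) & \text{if } \frac{n}{T} \in \mathbb{Z}^+, \\ \le P_s(p,T,m{=}\lfloor \lceil \frac{n}{T} \rceil T \rfloor) & \text{otherwise,} \end{cases}$$

we conclude that $m = \lfloor T \rfloor$ gives an optimal symmetric allocation. ■

Lemma 4 restates Lemma 3 in a slightly weaker but more convenient form:

Lemma 4. If $T \ge 1$ and $p \le \frac{2}{\lceil T \rceil} - \frac{1}{T}$, then $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is an optimal symmetric allocation.

Proof of Lemma 4. We will show that either condition (10) or condition (11) of Lemma 3 is satisfied for any $T > 1$ and $p \le \frac{2}{\lceil T \rceil} - \frac{1}{T}$ (the case $T = 1$ is trivial). We do this in two steps: First, we define the function

$$f(p,T) \triangleq \frac{p(1-p)^{\lceil T \rceil - 1}}{\frac{1}{T}\big(1 - \frac{1}{T}\big)^{\lceil T \rceil - 1}} - 1,$$

and show that $f(p,T) \le f(p{=}\frac{2}{\lceil T \rceil} - \frac{1}{T}, T) \le 0$ for any $T > 1$ and $p \le \frac{2}{\lceil T \rceil} - \frac{1}{T}$. Second, we apply the appropriate condition from Lemma 3 for each pair of $T$ and $p$.

The partial derivative of $f(p,T)$ wrt $p$ is given by

$$\frac{\partial}{\partial p} f(p,T) = \frac{(1 - p\lceil T \rceil)(1-p)^{\lceil T \rceil - 2}}{\frac{1}{T}\big(1 - \frac{1}{T}\big)^{\lceil T \rceil - 1}}.$$

Observe that $f(p,T)$ is nondecreasing wrt $p$ for any $T > 1$ and $p \le \frac{2}{\lceil T \rceil} - \frac{1}{T}$, since

$$p \le \frac{2}{\lceil T \rceil} - \frac{1}{T} \le \frac{2}{\lceil T \rceil} - \frac{1}{\lceil T \rceil} = \frac{1}{\lceil T \rceil} \implies 1 - p\lceil T \rceil \ge 0 \implies \frac{\partial}{\partial p} f(p,T) \ge 0. \tag{15}$$

Now, consider the function

$$g(T) \triangleq f\Big(p{=}\frac{2}{\lceil T \rceil} - \frac{1}{T}, T\Big) = \Big(\frac{2T}{\lceil T \rceil} - 1\Big) \left(\frac{1 - \frac{2}{\lceil T \rceil} + \frac{1}{T}}{1 - \frac{1}{T}}\right)^{\lceil T \rceil - 1} - 1.$$

We will proceed to show that $g(T) \le 0$ for any $T > 1$ by reparameterizing $g(T)$ as $h(c,\tau)$, where $c = \lceil T \rceil$ and $\tau = \lceil T \rceil - T \in [0,1)$:

$$h(c,\tau) \triangleq g(T{=}c-\tau) = \Big(1 - \frac{2\tau}{c}\Big) \Big(1 + \frac{2\tau}{c(c-1-\tau)}\Big)^{c-1} - 1.$$

The partial derivative of $h(c,\tau)$ wrt $\tau$ is given by

$$\frac{\partial}{\partial \tau} h(c,\tau) = -\frac{2\tau^2 (c-2)}{c^2 (c-1-\tau)^2} \Big(1 + \frac{2\tau}{c(c-1-\tau)}\Big)^{c-2}.$$

Since $\frac{\partial}{\partial \tau} h(c,\tau) \le 0$ for any $c \in \mathbb{Z}^+$, $c \ge 2$, and $\tau \in [0,1)$, it follows that for any $T > 1$, we have

$$g(T) = h(c{=}\lceil T \rceil, \tau{=}\lceil T \rceil - T) \le h(c{=}\lceil T \rceil, \tau{=}0) = \frac{1}{\lceil T \rceil}\Big(1 - \frac{1}{\lceil T \rceil}\Big)^{\lceil T \rceil - 1} \Big/ \Big[\frac{1}{\lceil T \rceil}\Big(1 - \frac{1}{\lceil T \rceil}\Big)^{\lceil T \rceil - 1}\Big] - 1 = 0.$$

Combining these results, we obtain

$$f(p,T) \le f\Big(p{=}\frac{2}{\lceil T \rceil} - \frac{1}{T}, T\Big) = g(T) \le 0$$

for any $T > 1$ and $p \le \frac{2}{\lceil T \rceil} - \frac{1}{T}$, which implies

$$p(1-p)^{\lceil T \rceil - 1} \le \frac{1}{T}\Big(1 - \frac{1}{T}\Big)^{\lceil T \rceil - 1}.$$

Finally, we apply the appropriate condition from Lemma 3 for each pair of $T$ and $p$. For $T \in \mathbb{Z}^+$, $T > 1$, we have $\frac{2}{\lceil T \rceil} - \frac{1}{T} = \frac{1}{T}$: we use condition (10) for $p = \frac{1}{T}$, and condition (11) for $p < \frac{1}{T}$. For $T \notin \mathbb{Z}^+$, $T > 1$, we have $\frac{2}{\lceil T \rceil} - \frac{1}{T} < \frac{1}{T}$: we use condition (11) for $p \le \frac{2}{\lceil T \rceil} - \frac{1}{T}$. ■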
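The two steps of this proof can be mirrored in a short exact check: $h(c,\tau)$ agrees with $g(c-\tau)$, vanishes at $\tau = 0$, and is nonincreasing in $\tau$; and every grid point in the region of Lemma 4 satisfies condition (10) or (11). The grids of $c$, $\tau$, $T$, and $p$ are arbitrary choices for illustration, and the helpers are hypothetical.

```python
from fractions import Fraction as F
from math import ceil


def g(T):
    """g(T) = f(p = 2/ceil(T) - 1/T, T) from the proof of Lemma 4."""
    T = F(T)
    c = ceil(T)
    p = F(2, c) - 1 / T
    return p * (1 - p) ** (c - 1) / ((1 / T) * (1 - 1 / T) ** (c - 1)) - 1


def h(c, tau):
    """h(c, tau) = g(T = c - tau), in the reparameterized closed form."""
    tau = F(tau)
    return (1 - 2 * tau / c) * (1 + 2 * tau / (c * (c - 1 - tau))) ** (c - 1) - 1


def lemma3_holds(p, T):
    """Condition (10) or condition (11) of Lemma 3."""
    T = F(T)
    c = ceil(T)
    cond10 = T.denominator == 1 and p * T == 1
    cond11 = p * T < 1 and p * (1 - p) ** (c - 1) <= (1 / T) * (1 - 1 / T) ** (c - 1)
    return cond10 or cond11


for c in range(2, 11):
    taus = [F(k, 10) for k in range(10)]
    assert h(c, 0) == 0
    assert all(h(c, t) == g(c - t) for t in taus)
    assert all(h(c, a) >= h(c, b) for a, b in zip(taus, taus[1:]))
```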

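Theorem 3 expands the region covered by Lemma 4 by showing that $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ remains optimal between the "peaks" in Fig. 3:

Proof of Theorem 3. Since

$$T \le \Big\lfloor \frac{1}{p} \Big\rfloor \iff \lceil T \rceil \le \Big\lfloor \frac{1}{p} \Big\rfloor \iff \lceil T \rceil \le \frac{1}{p} \iff p \le \frac{1}{\lceil T \rceil},$$

it suffices to show that $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is an optimal symmetric allocation for any $T \ge 1$ and $p \le \frac{1}{\lceil T \rceil}$. (The theorem is trivially true for $T = 1$.) We do this by considering subintervals of $T$ over which $\lceil T \rceil$ is constant.

Let $T$ be confined to the unit interval $(c, c+1]$, where $c \in \mathbb{Z}^+$. According to Lemma 4, $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is optimal for any $p \in \big(0, \frac{2}{c+1} - \frac{1}{T}\big]$ and $T \in (c, c+1]$, or equivalently, for any

$$p \in \Big(0, \frac{1}{c+1}\Big] \quad \text{and} \quad T \in \Big[\Big(\frac{2}{c+1} - p\Big)^{-1},\ c+1\Big] \cap (c, c+1].$$

This is just the area below a "peak" in Fig. 3, expressed in terms of different independent variables. For each $p \in (0, \frac{1}{c+1})$, we can always find a $T_0$ such that

$$T_0 \in \Big[\Big(\frac{2}{c+1} - p\Big)^{-1},\ c+1\Big) \cap (c, c+1);$$

for example, we can pick $T_0 = c + 1 - \varepsilon$, where $\varepsilon = \frac{1}{2}\big(c + 1 - \max\{c, (\frac{2}{c+1} - p)^{-1}\}\big)$.

Now, we make the crucial observation that if $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is an optimal symmetric allocation for $T = T_0$, then $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is also an optimal symmetric allocation for any $T \in [\lfloor T_0 \rfloor, T_0]$. This claim can be proven by contradiction: the recovery probability for $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is $P_s(p,T,m{=}\lfloor T \rfloor) = \mathbb{P}[B(\lfloor T \rfloor, p) \ge 1]$, which remains constant for all $T \in [\lfloor T_0 \rfloor, T_0]$, and a symmetric allocation that performs strictly better than $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ for some $T \in [\lfloor T_0 \rfloor, T_0]$ would therefore also outperform $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ for $T = T_0$ (increasing the budget of a symmetric allocation with fixed $m$ cannot decrease its recovery probability). Since $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is indeed optimal for our choice of $T_0$, it follows then that $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is also optimal for any $p \in (0, \frac{1}{c+1})$ and $T \in (c, c+1]$ (for $T \in [T_0, c+1]$, this follows directly from Lemma 4). By applying this result for each $c \in \mathbb{Z}^+$, we reach the conclusion that $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is an optimal symmetric allocation for any $T > 1$ and $p < \frac{1}{\lceil T \rceil}$.

Finally, to extend the optimality of $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ to $p = \frac{1}{\lceil T \rceil}$, we note that the recovery probability $P_s(p,T,m) = \mathbb{P}[B(m,p) \ge \lceil \frac{m}{T} \rceil]$ is a polynomial in $p$ and is therefore continuous at $p = \frac{1}{\lceil T \rceil}$. Since $\mathbf{x}(n,T,m{=}\lfloor T \rfloor)$ is optimal as $p \to (\frac{1}{\lceil T \rceil})^-$, it remains optimal at $p = \frac{1}{\lceil T \rceil}$. ■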